Our project examines the impact of the COVID-19 pandemic on different aspects of the economy such as GDP and unemployment rate. We decided to focus our analysis on four states that had varying coronavirus restrictions.
#Economy VisualizationThe global pandemic this past year has impacted many aspects of our normalcy and daily lives. We utilized the Twitter Developer API to better understand how people across the world were thinking about the current state of our economy, by gathering the 600 most recent tweets published with #economy and creating a word cloud to visualize our findings. The top words in the word cloud are “covid”, “business”, followed by “jobs”, “government”, and “globalhealth” to name a few.
We predict a linear correlation between unemployment rates and number of cases. As the number of COVID-19 cases increased, local and state governments entered lockdowns and further quarantine restrictions which impacted business and services, leading to higher levels of unemployment. In contrast, we hypothesize that an increasing number of deaths may not have had an equivalently strong correlation with unemployment rates. There are a few reasons for this: (1) cases spread and are recorded more quickly, (2) the number of cases is drastically higher than number of deaths, (3) there seemed to be a long-term misconception that only the elderly and immunocompromised (who might represent a smaller subset of the general population) were at high risk for severe loss due to the virus.
In terms of GDP per State, we have chosen California, Texas, Rhode Island, and Utah to examine in detail. We predict that California might have experienced an increase in GDP, due to the number of technology companies residing in the State of California who reported record amounts of sales during the shift to a remote, virtual world. We briefly researched this topic with the California State Assembly. California and Texas each have large populations and were at the top of the charts at distinct points of time for cases/deaths, but followed drastically contrasting responses to the pandemic in terms of lockdowns, mask mandates, etc. We hypothesize that California would have seen an increase or at least continuation of GDP from previous years whereas Texas might have experienced a decline. Rhode Island and Utah are closely related in population amounts, however, Utah was the one of the only states to not issue a lockdown order. Specific counties in Utah with larger populations or tourist sites did issue lockdown orders, whereas Rhode Island’s governor issued a stay at home order in late March itself. We briefly researched stay at home orders through the New York Times. We predict that Rhode Island will experience a steady GDP in 2020 whereas Utah might experience a decline due to tourist locations and highly populated areas (with the highest job density) were the only ones experiencing lockdowns.
| variable | description |
|---|---|
U.S. COVID-19 Cases |
Number of COVID-19 positive cases recorded each day in each state; this data has been wrangled into monthly national data from January 2020 to March 2021. |
U.S. COVID-19 Deaths |
Number of deaths from COVID-19 recorded each day in each state; this data has been wrangled into monthly national data from January 2020 to March 2021. |
U.S. Unemployment Rate per Month |
National recorded unemployment rates on a monthly basis from January 2020 to March 2021. |
U.S. GDP Percent Change per Year |
National monthly change in unemployment rates on a monthly basis from January 2020 to March 2021. |
Industry |
Arts, Health, Food, Eeal Estate, and Technical. |
States |
California, Texas, Utah, and Rhode Island. |
# load twitter library, and all other libraries
library(rtweet)
library(ggplot2)
library(dplyr)
library(tidytext)
# name of created app
appname <- "economic-impact-of-covid19"
## api key
key <- "LDVYl0gTvhvYbNnsiJd3IrTMk"
## api secret
secret <- "HnBymm96BOg1SYZKxvgURKc7ULJ8MKE0Fnj0ZjFfWqey7e34FQ"
# create token named "twitter_token"
twitter_token <- create_token(
app = appname,
consumer_key = key,
consumer_secret = secret,
access_token = '2620473211-WWfgOYF0UiZM0yLw3K0T3WxlG5a2cyH5heHimhC',
access_secret = 'Z6LWeugQ0mi7ANivINfUvWndSkT0ic3gceF0YQn2MC4jY')
data <- search_tweets("#economy", n = 600, include_rts = FALSE)
head(data$text)
## [1] "With increase in inflation rate and consistent interest rate, real interest rate of #Pakistan remains negative.\n\n#economy #KSE100 #interestrates #inflation https://t.co/dvvfm9lLmY"
## [2] "#cybercrime, #Darkside gang now target #StockMarket organizations. @RecordedFuture #CyberSecurity experts: The #Ransomware group will notify info to crooked market traders in advance, so they can short a company’s #stockprice. #infosec #economy https://t.co/mgVypG2IwX"
## [3] "“This graph shows the amount that the #German Central Bank has lent to the @ecb . This amount is owed mainly by #Spain & #Italy. It clearly shows the risk of a breakout in the #Eurozone At any moment there will be a group that demands to end this.” #economy https://t.co/zEdVFaSf5J"
## [4] "#AntiHomeless #architecture under #Capitalism vs under #Socialism (well, under #Communism is #housing totally for free, but yeah, in a #Socialist #economy still exist #money)\n\n#homelessness \n#socialissues https://t.co/Xn0nxU496D"
## [5] "Wealth Morning<U+2600><U+FE0F>May 3rd,2021\nBreaking News from PAK & Around the World\n\n<U+0001F53A>#Pakistan‘s #exports crossed $2 Billion for 7 months<U+0001F1F5><U+0001F1F0>\n<U+0001F53A>#UAE 2020 #economy shrank 6.1% amid #COVID19 <U+0001F1E6><U+0001F1EA>\n<U+0001F53A>#Asia #stocks to slow start; #WallStreet extended its #bullrun <U+0001F4C8>\n...\nhttps://t.co/hpBo0N3Kgh https://t.co/viLjJunNtQ"
## [6] "#April factory activity picks up, prices surge fastest in 7 years.\n#economy #India #Nifty #Investment #investing"
# Load libraries
library(wordcloud)
library(RColorBrewer)
library(wordcloud2)
library(tm)
# Create a vector containing only the text
text <- data$text
# Create a corpus
docs <- Corpus(VectorSource(text))
# Clean data
docs <- docs %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
docs <- tm_map(docs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(docs, content_transformer(tolower)):
## transformation drops documents
docs <- tm_map(docs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(docs, removeWords, stopwords("english")):
## transformation drops documents
dtm <- TermDocumentMatrix(docs)
matrix <- as.matrix(dtm)
words <- sort(rowSums(matrix),decreasing=TRUE)
df <- data.frame(word = names(words),freq=words)
# Generate the word cloud
set.seed(1234)
wordcloud(words = df$word, freq = df$freq, min.freq = 1, max.words=200, random.order=FALSE, rot.per=0.35, colors=brewer.pal(8, "Dark2"))
Let’s take a look at the total counts of new COVID-19 cases and deaths recorded each month in the U.S. This data has been reported by the U.S. Center for Disease Control in their data set: “United States COVID-19 Cases and Deaths by State over TimeCase Surveillance”. We have cleaned and organized daily COVID-19 reported data by state into national data by month.
## months dates newcases deaths
## 1 JAN20 2020-01-01 7 0
## 2 FEB20 2020-02-01 32 0
## 3 MAR20 2020-03-01 188138 3680
## 4 APR20 2020-04-01 875947 55007
## 5 MAY20 2020-05-01 725178 42053
## 6 JUN20 2020-06-01 847821 21634
## 7 JUL20 2020-07-01 1924412 28249
## 8 AUG20 2020-08-01 1470219 29184
## 9 SEPT20 2020-09-01 1217378 22327
## 10 OCT20 2020-10-01 1928056 24012
## 11 NOV20 2020-11-01 4400344 39397
## 12 DEC20 2020-12-01 6395806 79189
## 13 JAN21 2020-01-01 6098794 97095
## 14 FEB21 2020-02-01 2354530 63431
## 15 MAR21 2020-03-01 1773722 33041
Now, let’s look at how counts of new cases per month arose and fell from Jan ’20 to Mar ’21:
This bar plot helps us visually time line the spread COVID-19. The red bars denote the top five months which experienced the highest number of recorded cases, often referred to as peaks or “second, third” waves. We can infer that July ’20, Oct ’20, Nov ’20, Dec ’20, Jan ’21, and Feb ’21 had the highest number of recorded cases; from this data, we might predict that unemployment rates would be higher during these months and GDP would be lower.
Now, let’s look at how counts of new deaths due to COVID-19 per month arose and fell from Jan ’20 to Mar ’21:
This bar plot reports the months which recorded the highest number of deaths due to COVID-19; the bars in red signify the top five months with the highest amount of deaths. These months include: Apr ’20, May ’20, Dec ’20, Jan ’21, and Feb ’21. In correlation with the highest number of cases recorded, the months of Dec ’20 through Feb ’21 had both the highest number of cases and deaths. It is surprising that April and May of 2020 had two of the highest number of deaths although they didn’t have a peak in cases. However, Apr/May were early months of the pandemic where we had little knowledge about the virus, and their average case count is around 800K which might have propelled the peak in July of 1.9 million cases.
library(magrittr)
library(dplyr)
library(ggplot2)
library(plotly)
library(lubridate)
library(grid)
library(gridExtra)
library("tidyverse")
ur.df <- read.csv("C:/Users/Rinija/Documents/GitHub/economic-impact-of-covid19/Datasets/Unemployment-Monthly.csv")
covid19.health <- read.csv("C:/Users/Rinija/Documents/GitHub/economic-impact-of-covid19/Datasets/United_States_COVID-19_Cases_and_Deaths_by_State_over_Time.csv")
# Dropping unused columns
ur.cleaned.df <- subset(ur.df, select = -c(Series.ID, Year, Period))
# Renaming columns
colnames(ur.cleaned.df) <- c("Time_Period", "Unemployment_Rate", "Monthly_Change")
#Result
ur.cleaned.df
## Time_Period Unemployment_Rate Monthly_Change
## 1 2020 Jan 3.5 -2.8
## 2 2020 Feb 3.5 0.0
## 3 2020 Mar 4.4 25.7
## 4 2020 Apr 14.8 236.4
## 5 2020 May 13.3 -10.1
## 6 2020 Jun 11.1 -16.5
## 7 2020 Jul 10.2 -8.1
## 8 2020 Aug 8.4 -17.6
## 9 2020 Sep 7.8 -7.1
## 10 2020 Oct 6.9 -11.5
## 11 2020 Nov 6.7 -2.9
## 12 2020 Dec 6.7 0.0
## 13 2021 Jan 6.3 -6.0
## 14 2021 Feb 6.2 -1.6
## 15 2021 Mar 6.0 -3.2
# Dropping unused columns
covh.cleaned <- subset(covid19.health, select = c(submission_date, state, tot_cases, new_case, tot_death, new_death))
# Renaming columns
colnames(covh.cleaned) <- c("Date_Reported", "State", "Total_Cases", "New_Cases", "Total_Death", "New_Death")
# https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html
covh.cleaned$Date_Reported <- mdy(covh.cleaned$Date_Reported) # returns year/month/day
# Result
head(covh.cleaned$Date_Reported)
## [1] "2021-04-01" "2020-10-15" "2021-03-16" "2021-04-16" "2020-02-14"
## [6] "2020-08-08"
# https://blog.exploratory.io/filter-with-date-function-ce8e84be680
# JANUARY 2020
jan20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported <= as.Date("2020-01-31")))
jan20.newcases <- sum(jan20.covid.df$New_Cases)
jan20.deaths <- sum(jan20.covid.df$New_Death)
# FEBRUARY 2020
feb20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-02-01") & Date_Reported <= as.Date("2020-02-28")))
feb20.newcases <- sum(feb20.covid.df$New_Cases)
feb20.deaths <- sum(feb20.covid.df$New_Death)
# MARCH 2020
mar20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-03-01") & Date_Reported <= as.Date("2020-03-31")))
mar20.newcases <- sum(mar20.covid.df$New_Cases)
mar20.deaths <- sum(mar20.covid.df$New_Death)
# APRIL 2020
apr20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-04-01") & Date_Reported <= as.Date("2020-04-30")))
apr20.newcases <- sum(apr20.covid.df$New_Cases)
apr20.deaths <- sum(apr20.covid.df$New_Death)
# MAY 2020
may20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-05-01") & Date_Reported <= as.Date("2020-05-31")))
may20.newcases <- sum(may20.covid.df$New_Cases)
may20.deaths <- sum(may20.covid.df$New_Death)
# JUNE 2020
jun20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-06-01") & Date_Reported <= as.Date("2020-06-30")))
jun20.newcases <- sum(jun20.covid.df$New_Cases)
jun20.deaths <- sum(jun20.covid.df$New_Death)
# JULY 2020
jul20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-07-01") & Date_Reported <= as.Date("2020-07-31")))
jul20.newcases <- sum(jul20.covid.df$New_Cases)
jul20.deaths <- sum(jul20.covid.df$New_Death)
# AUGUST 2020
aug20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-08-01") & Date_Reported <= as.Date("2020-08-31")))
aug20.newcases <- sum(aug20.covid.df$New_Cases)
aug20.deaths <- sum(aug20.covid.df$New_Death)
# SEPTEMBER 2020
sep20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-09-01") & Date_Reported <= as.Date("2020-09-30")))
sep20.newcases <- sum(sep20.covid.df$New_Cases)
sep20.deaths <- sum(sep20.covid.df$New_Death)
# OCTOBER 2020
oct20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-10-01") & Date_Reported <= as.Date("2020-10-31")))
oct20.newcases <- sum(oct20.covid.df$New_Cases)
oct20.deaths <- sum(oct20.covid.df$New_Death)
# NOVEMBER 2020
nov20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-11-01") & Date_Reported <= as.Date("2020-11-30")))
nov20.newcases <- sum(nov20.covid.df$New_Cases)
nov20.deaths <- sum(nov20.covid.df$New_Death)
# DECEMBER 2020
dec20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-12-01") & Date_Reported <= as.Date("2020-12-31")))
dec20.newcases <- sum(dec20.covid.df$New_Cases)
dec20.deaths <- sum(dec20.covid.df$New_Death)
# JANUARY 2021
jan21.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2021-01-01") & Date_Reported <= as.Date("2021-01-31")))
jan21.newcases <- sum(jan21.covid.df$New_Cases)
jan21.deaths <- sum(jan21.covid.df$New_Death)
# FEBRUARY 2021
feb21.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2021-02-01") & Date_Reported <= as.Date("2021-02-28")))
feb21.newcases <- sum(feb21.covid.df$New_Cases)
feb21.deaths <- sum(feb21.covid.df$New_Death)
# MARCH 2021
mar21.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2021-03-01") & Date_Reported <= as.Date("2021-03-31")))
mar21.newcases <- sum(mar21.covid.df$New_Cases)
mar21.deaths <- sum(mar21.covid.df$New_Death)
# Creating a variable for months
months <- c("JAN20", "FEB20", "MAR20", "APR20", "MAY20", "JUN20", "JUL20", "AUG20", "SEPT20", "OCT20", "NOV20", "DEC20", "JAN21", "FEB21", "MAR21")
# Creating a variable for new cases
newcases <- c(jan20.newcases, feb20.newcases, mar20.newcases, apr20.newcases, may20.newcases, jun20.newcases, jul20.newcases, aug20.newcases, sep20.newcases, oct20.newcases, nov20.newcases, dec20.newcases, jan21.newcases, feb21.newcases, mar21.newcases)
# Creating a variable for deaths
deaths <- c(jan20.deaths, feb20.deaths, mar20.deaths, apr20.deaths, may20.deaths, jun20.deaths, jul20.deaths, aug20.deaths, sep20.deaths, oct20.deaths, nov20.deaths, dec20.deaths, jan21.deaths, feb21.deaths, mar21.deaths)
# Time series
dates <- c("01/01/2020", "02/01/2020", "03/01/2020","04/01/2020", "05/01/2020","06/01/2020", "07/01/2020", "08/01/2020", "09/01/2020", "10/01/2020","11/01/2020", "12/01/2020", "01/01/2021", "02/01/2021", "03/01/2021")
# COVID-19 health per month dataframe
monthly.covid.df <- data.frame(months, dates, newcases, deaths)
monthly.covid.df <- monthly.covid.df %>%
mutate(dates = as.Date(dates, format = "%m/%d/%y"))
# Result
# grid.draw(tableGrob(monthly.covid.df, theme=ttheme_default(base_size = 6) ))
head(monthly.covid.df)
## months dates newcases deaths
## 1 JAN20 2020-01-01 7 0
## 2 FEB20 2020-02-01 32 0
## 3 MAR20 2020-03-01 188138 3680
## 4 APR20 2020-04-01 875947 55007
## 5 MAY20 2020-05-01 725178 42053
## 6 JUN20 2020-06-01 847821 21634
# Find the top 5 highest months
head(arrange(monthly.covid.df,desc(newcases)), n = 5)
# Color labels
top5.cases <- c("#69b3a2","#69b3a2","#69b3a2","#69b3a2","#69b3a2","#69b3a2",
"#B20000","#69b3a2","#69b3a2","#B20000","#B20000","#B20000","#B20000","#B20000","#69b3a2")
# Case labels
num.cases.mils <- c("7", "32", "188K", "875K", "725K","847K", "1.9m", "1.4m", "1.2m", "1.9m", "4.4m", "6.3m", "6m", "2.3m", "1.7m")
# Visualize
b.cases <- barplot(newcases, yaxp=c(0, max(newcases), 15),
ylim=range(pretty(c(0, newcases))), names=months, xlab = "Time Period", ylab = "Number of New Recorded Cases", main = "Number of Newly Recorded COVID-19 Cases per Month in the U.S. 20-21", col = top5.cases)
y<-as.matrix(newcases)
text(b.cases, y+300000,labels=num.cases.mils)
# Find the top 5 highest months
head(arrange(monthly.covid.df,desc(deaths)), n = 5)
# Color labels
deaths.top5 <- c("#90bcff","#90bcff","#90bcff","#B20000","#B20000","#90bcff",
"#90bcff","#90bcff","#90bcff","#90bcff","#90bcff","#B20000","#B20000","#B20000","#90bcff")
# Case labels
num.deaths.mils <- c(0, 0, '3,680', '55K', '42K,', '21K', '28K', '29K', '22K', '24K', '39K', '79K', '97K', '63K', '33K')
# Visualize
b.deaths <- barplot(deaths, yaxp=c(0, max(deaths), 15),
ylim=range(pretty(c(0, deaths))), names=months, xlab = "Time Period", ylab = "Number of New Recorded Deaths", main = "Number of Newly Recorded COVID-19 Deaths per Month in the U.S. 20-21", col = deaths.top5)
y<-as.matrix(deaths)
text(b.deaths ,y+2000, labels=num.deaths.mils)
The table below summarizes the unemployment rates and monthly change in unemployment per month from Jan ’20 to Mar ’21. New cases and deaths per month have been calcuated as a proportion of the total cases and deaths from Jan ’20 to Mar ’21 in order to create a line chart with all three variables.
## Date Unemployment_Rate Monthly_Change percentcases percentdeaths
## 1 JAN20 3.5 -2.8 0.00002317851 0.0000000
## 2 FEB20 3.5 0.0 0.00010595892 0.0000000
## 3 MAR20 4.4 25.7 0.62296558878 0.6836349
## 4 APR20 14.8 236.4 2.90044987507 10.2186703
## 5 MAY20 13.3 -10.1 2.40122112355 7.8122010
## 6 JUN20 11.1 -16.5 2.80731860893 4.0189560
## 7 JUL20 10.2 -8.1 6.37214414227 5.2478270
## 8 AUG20 8.4 -17.6 4.86821293398 5.4215222
## 9 SEPT20 7.8 -7.1 4.03100172501 4.1476949
## 10 OCT20 6.9 -11.5 6.38421021402 4.4607179
## 11 NOV20 6.7 -2.9 14.57049022953 7.3187949
## 12 DEC20 6.7 0.0 21.17789628105 14.7109692
## 13 JAN21 6.3 -6.0 20.19442534241 18.0373733
## 14 FEB21 6.2 -1.6 7.79635782115 11.7835998
## 15 MAR21 6.0 -3.2 5.87317697682 6.1380385
Now let’s take a look at correlations between unemployment rates, cases, and deaths:
SUMMARY
SUMMARY
# Create a new unemployment dataframe
ur.viz <- ur.cleaned.df
# Renaming columns
colnames(ur.viz) <- c("Date", "Unemployment_Rate", "Monthly_Change")
# Cleaning and wrangling time/date data
ur.viz$Date <- months
ur.viz$Date <- factor(ur.viz$Date, levels = c("JAN20", "FEB20", "MAR20", "APR20", "MAY20", "JUN20", "JUL20", "AUG20", "SEPT20", "OCT20", "NOV20", "DEC20", "JAN21", "FEB21", "MAR21"))
# Add COVID-19 cases data, calculate contribution percentage from each month to total cases
options(scipen = 999)
vector = c()
for (i in 1:length(newcases)) {
vector <- c(vector, (newcases[i]/sum(newcases)*100))
formatC(vector[i], digits = 1)
}
ur.viz$percentcases <- vector
# Add COVID-19 deaths data, calculate contribution percentage from each month to total deaths
vector.d = c()
for (i in 1:length(deaths)) {
vector.d <- c(vector.d, (deaths[i]/sum(deaths)*100))
formatC(vector.d[i], digits = 1)
}
ur.viz$percentdeaths <- vector.d
# Result
head(ur.viz)
## Date Unemployment_Rate Monthly_Change percentcases percentdeaths
## 1 JAN20 3.5 -2.8 0.00002317851 0.0000000
## 2 FEB20 3.5 0.0 0.00010595892 0.0000000
## 3 MAR20 4.4 25.7 0.62296558878 0.6836349
## 4 APR20 14.8 236.4 2.90044987507 10.2186703
## 5 MAY20 13.3 -10.1 2.40122112355 7.8122010
## 6 JUN20 11.1 -16.5 2.80731860893 4.0189560
# Creating the graph
ggplot(ur.viz, aes(x=Date, group = 1)) + geom_line(aes(y = Unemployment_Rate, group = 1), color = "darkred") + geom_line(aes(y = percentcases, group = 1), color="orange", linetype="twodash") + geom_line(aes(y = percentdeaths, group = 1), color="darkgreen", linetype="twodash") + labs(x = "Time Period", y = "Percentage", caption="Source: Center for Disease Control (CDC) and U.S. Bureau of Economic Analysis (BEA)") + ggtitle("Unemployment Rate Trends in the U.S. 2020-21")
ggplot(ur.viz, aes(x=Date, group = 1)) + geom_line(aes(y = Monthly_Change, group = 1), color = "darkred") + geom_line(aes(y = percentcases, group = 1), color="orange", linetype="twodash") + geom_line(aes(y = percentdeaths, group = 1), color="darkgreen", linetype="twodash") + labs(x = "Time Period", y = "Change Percentage", caption="Source: Center for Disease Control (CDC) and U.S. Bureau of Economic Analysis (BEA)") + ggtitle("Monthly Change in Unemployment Rates in the U.S. 2020-21")
SUMMARY
SUMMARY
SUMMARY
# Load in necessary libraries
library(leaflet)
library(maptools)
library(rgeos)
library(rgdal)
# This dataset shows the percent change in GDP for each state in the United States from 2017 to 2020.
gdp.state.df <- read.csv("C:/Users/Rinija/Documents/GitHub/economic-impact-of-covid19/Datasets/GDP-State.csv")
state.df <- read.csv("C:/Users/Rinija/Documents/GitHub/economic-impact-of-covid19/Datasets/GDP-State.csv")
# This dataset shows the percent change in GDP for each industry in each state in the United States from 2019 to 2020.
gdp.industry.df <- read.csv("C:/Users/Rinija/Documents/GitHub/economic-impact-of-covid19/Datasets/GDP-Industry.csv")
# This is the shapefile for each state
states <- readOGR(dsn = "C:/Users/Rinija/Documents/GitHub/economic-impact-of-covid19/Datasets", layer = "cb_2018_us_state_500k")
## Warning in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv
## = use_iconv, : ogrInfo: C:\Users\Rinija\Documents\GitHub\economic-impact-of-
## covid19\Datasets/cb_2018_us_state_500k.dbf not found
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Rinija\Documents\GitHub\economic-impact-of-covid19\Datasets", layer: "cb_2018_us_state_500k"
## with 56 features
## It has 0 fields
# Renaming columns
colnames(gdp.state.df) <- c("State", "Y2017", "Y2018", "Y2019", "Y2020", "Rank-Y2020")
# Removing spaces from all rows and pivoting the data to be in long format
gdp.state.df <- gdp.state.df %>%
mutate(across(where(is.character), str_remove_all, pattern = fixed(" "))) %>%
pivot_longer(cols = Y2017:Y2020, names_to = "Year", values_to = "Change")
# California Subset
ca.gdp <- gdp.state.df[(gdp.state.df$State == "California"), ]
# Texas Subset
tx.gdp <- gdp.state.df[(gdp.state.df$State == "Texas"), ]
# Rhode Island Subset
ri.gdp <- gdp.state.df[(gdp.state.df$State == "RhodeIsland"), ]
# Utah Subset
ut.gdp <- gdp.state.df[(gdp.state.df$State == "Utah"), ]
CA <- ggplot(ca.gdp %>% mutate(fill_value = ifelse(Change > 0, "Increase in GDP", "Decrease in GDP")), aes(x = Year, y = Change, fill = fill_value))+
geom_bar(stat="identity")+
scale_fill_manual(values=c("Increase in GDP" = "darkolivegreen3","Decrease in GDP"= "brown1"))+
ggtitle("Percent Change in GDP from 2017-2020 in California")+
geom_hline(yintercept = 0)
TX <- ggplot(tx.gdp %>% mutate(fill_value = ifelse(Change > 0, "Increase in GDP", "Decrease in GDP")), aes(x = Year, y = Change, fill = fill_value))+
geom_bar(stat="identity")+
scale_fill_manual(values=c("Increase in GDP" = "darkolivegreen3","Decrease in GDP"= "brown1"))+
ggtitle("Percent Change in GDP from 2017-2020 in Texas")+
geom_hline(yintercept = 0)
RI <- ggplot(ri.gdp %>% mutate(fill_value = ifelse(Change > 0, "Increase in GDP", "Decrease in GDP")), aes(x = Year, y = Change, fill = fill_value))+
geom_bar(stat="identity")+
scale_fill_manual(values=c("Increase in GDP" = "darkolivegreen3","Decrease in GDP"= "brown1"))+
ggtitle("Percent Change in GDP from 2017-2020 in Rhode Island")+
geom_hline(yintercept = 0)
UT <- ggplot(ut.gdp %>% mutate(fill_value = ifelse(Change > 0, "Increase in GDP", "Decrease in GDP")), aes(x = Year, y = Change, fill = fill_value))+
geom_bar(stat="identity")+
scale_fill_manual(values=c("Increase in GDP" = "darkolivegreen3","Decrease in GDP"= "brown1"))+
ggtitle("Percent Change in GDP from 2017-2020 in Utah")+
geom_hline(yintercept = 0)
label_2020 <- paste0(
"<b>2020:</b> ", state.df$X2020, "<br>"
)
paletteBins <- c(-8, -6, -4, -2, 0)
colorPalette <- colorBin(palette = "RdYlGn", domain = state.df$X2020, na.color = "transparent", bins = paletteBins)
leaflet(states) %>%
addTiles() %>%
setView(lat = 39.8097, lng = -98.5556, zoom=4) %>%
addPolygons(
stroke = TRUE,
fillColor = ~colorPalette(state.df$X2020),
color = 'Black',
weight = 1.5,
label = ~lapply(label_2020, htmltools::HTML)) %>%
addLegend(pal = colorPalette, values = state.df$X2020, opacity = 0.9, title = "GDP Percent Change in 2020", position="bottomleft")
label_2019 <- paste0(
"<b>2019:</b> ", state.df$X2019, "<br>"
)
paletteBins <- c(-1, 0, 1, 2, 3, 4, 5, 6)
colorPalette <- colorBin(palette = "Greens", domain = state.df$X2019, na.color = "transparent", bins = paletteBins)
leaflet(states) %>%
addTiles() %>%
setView(lat = 39.8097, lng = -98.5556, zoom=4) %>%
addPolygons(
stroke = TRUE,
fillColor = ~colorPalette(state.df$X2019),
color = 'Black',
weight = 1.5,
label = ~lapply(label_2019, htmltools::HTML)) %>%
addLegend(pal = colorPalette, values = state.df$X2019, opacity = 0.9, title = "GDP Percent Change in 2019", position="bottomleft")
United States COVID-19 Cases and Deaths by State over Time Dataset from the U.S. Center for Disease Control and Prevention.
Labor Force Statistics from the Current Population Survey Dataset from the U.S. Bureau of Labor Statistics.
Gross Domestic Product by State, 4th Quarter 2020 and Annual 2020 (Preliminary) from U.S. Bureau of Economic Analysis.
Interactive Choropleth Maps from R Journalism.
R and Leaflet to create interactive choropleth maps from Towardsdatascience.
Filter with Date data from Exploratory.io.
R Markdown: The Definitive Guide from Bookdown.
Lesson 2. Twitter Data in R Using Rtweet: Analyze and Download Twitter Data from EarthLab.
How to Generate Word Clouds in R from Towardsdatascience.
We have created a GitHub Repository with our entire codebase and data sets for future reference.
Please note: a few of our graphs may get cut off once we knit the Rmd file due to spacing limits, however, we have saved PNG images of our graphs and added them to the ‘Graph-Images’ folder in our GitHub Repository so you can see the entire image!